Similarity-Based Estimation of Word Cooccurrence Probabilities

Appears in ACL '94

Authors

  • Ido Dagan
  • Fernando Pereira
  • Lillian Lee
Abstract

In many applications of natural language processing it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination according to its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in a given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on "most similar" words. We describe a probabilistic word association model based on distributional word similarity, and apply it to improving probability estimates for unseen word bigrams in a variant of Katz's back-off model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error.
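The core idea of the abstract, estimating the probability of an unseen bigram from the bigrams of similar words, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names are hypothetical, the similarity weights are supplied by the caller rather than derived from a distributional measure, and the plain weighted average is not claimed to be the paper's exact formula. In a back-off setting, such an estimate would presumably be consulted only for bigrams with zero training count.

```python
from collections import Counter


def train_counts(tokens):
    """Count bigrams and the words that appear as their left context."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    return bigrams, contexts


def cond_prob(bigrams, contexts, w1, w2):
    """Maximum-likelihood P(w2 | w1); 0.0 if w1 never occurs as a context."""
    return bigrams[(w1, w2)] / contexts[w1] if contexts[w1] else 0.0


def similarity_estimate(bigrams, contexts, w1, w2, neighbors):
    """Weighted average of P(w2 | n) over words n deemed similar to w1.

    `neighbors` maps each similar word to a nonnegative weight. How those
    weights are obtained is an assumption left to the caller; w1 itself is
    not used because the bigram (w1, w2) is presumed unseen.
    """
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    weighted = sum(
        weight * cond_prob(bigrams, contexts, n, w2)
        for n, weight in neighbors.items()
    )
    return weighted / total


# Toy corpus: "devour pear" never occurs, but "eat pear" does.
tokens = ["eat", "peach", "eat", "peach", "eat", "pear", "devour", "peach"]
bigrams, contexts = train_counts(tokens)
estimate = similarity_estimate(
    bigrams, contexts, "devour", "pear", {"eat": 3.0, "peach": 1.0}
)
print(estimate)
```

Here the unseen bigram "devour pear" receives a nonzero estimate because "eat", a word treated as similar to "devour", has been observed before "pear".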


Similar resources

Exact maximum coverage probabilities of confidence intervals with increasing bounds for Poisson distribution mean

A Poisson distribution is widely used as a standard model for analyzing count data, so estimation of the Poisson distribution parameter is widely applied in practice. Providing accurate confidence intervals for the parameters of discrete distributions is very difficult. So far, many asymptotic confidence intervals for the mean of the Poisson distribution have been provided. It is known that the coverag...


Development of an Index-based Regression Model for Soil Moisture Estimation Using MODIS Imageries by Considering Soil Texture Effects

Soil moisture content (SMC) is one of the most significant variables in drought assessment and climate change. Near-real-time and accurate monitoring of this quantity by means of remote sensing (RS) is a useful strategy at regional scales. So far, various methods for SMC estimation using RS data have been developed. The use of spectral information based on a small range of electromagnetic...


An ontological hybrid recommender system for dealing with cold start problem

Recommender Systems (RSs) are expected to suggest accurate goods to consumers. Cold start is the most important challenge for RSs. Recent hybrid RSs combine  and . We introduce an ontological hybrid RS where the ontology has been employed in its  part while improving the ontology structure by its  part. In this paper, a new hybrid approach is proposed based on the combination of demog...


Correlation between IP and Rs and grade data in modeling and evaluation of a copper deposit, case study: the Sarbisheh copper deposit, Iran

This paper addresses the application of integrated chargeability and resistivity method and grade data in modeling and evaluation of copper deposits. We argue that the relationship between IP, Rs and grade data may be used for modeling and reserve estimation and tested this argument for Sarbisheh copper deposit that is located in eastern Iran. Geology and mineralization situation of Sarbisheh d...


Development and evaluation of solar radiation estimation models based on sunshine hours and meteorological data

Global solar radiation (Rs) has wide applications in several disciplines. Measured or predicted Rs data are widely used by solar engineers, architects, agriculturists and hydrologists. Due to the importance of Rs, several empirical models have been developed to predict its values all over the world. In this study, the Angstrom model was calibrated based on the ratio of actual and possible...



Publication date: 1994